Automatic medieval charters structure detection: A Bi-LSTM linear segmentation approach
Abstract
This paper presents a model aiming to automatically detect sections in medieval Latin charters. These legal sources are some of the most important for medieval studies, as they reflect economic and social dynamics as well as institutional writing practices. An automatic linear segmentation can greatly facilitate charter indexation and speed up the recovery of evidence to support historical hypotheses by means of granular inquiries on these raw, rarely structured sources. Our model is based on a Bi-LSTM approach using a final CRF layer and was trained on a large, annotated collection of medieval charters (4,700 documents) coming from Lombard monasteries: the CDLM corpus (11th-12th centuries). The evaluation shows high performance on the test set and on an external set consisting of charters from Montecassino abbey (10th-12th centuries). We describe the architecture of the model and the main problems related to the treatment of medieval Latin and formulaic discourse, and we discuss the implications of the results in terms of record-keeping practices in the High Middle Ages.
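As a rough illustration of what linear segmentation means here, the sketch below collapses a sequence of per-token labels, such as a sequence tagger might emit, into contiguous sections. It is not the paper's implementation: the function name `labels_to_segments`, the sample Latin tokens, and the three section labels are assumptions chosen for the example, and no trained model is involved.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-token labels into (label, start, end) spans.

    `start` is inclusive and `end` is exclusive, so tokens[start:end]
    yields the tokens of one section.
    """
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, position, position + length))
        position += length
    return segments


# Hypothetical tagger output: one label per token. The label names are
# illustrative only and are not the tag set used in the paper.
tokens = (
    "In nomine Domini ego Petrus dono terram meam "
    "actum Mediolani feliciter"
).split()
predicted = ["protocol"] * 3 + ["dispositio"] * 5 + ["eschatocol"] * 3

for label, start, end in labels_to_segments(predicted):
    print(label, " ".join(tokens[start:end]))
```

Because the segmentation is linear, every token belongs to exactly one section and the sections do not overlap, which is why a flat list of spans is enough to represent the result.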
Similar resources
Automatic Writer Identification in Medieval Papal Charters
Automatic writer identification and writer verification has recently received significant attention in the field of historical analysis. In this work a short overview of current approaches for writer identification is given. Current state-of-the-art results on contemporary data are related to different approaches for writer verification on a small dataset of datum lines extracted from papal cha...
Dating medieval English charters
Deeds, or charters, dealing with property rights, provide a continuous documentation which can be used by historians to study the evolution of social, economic and political changes. This study is concerned with charters (written in Latin) dating from the tenth through early fourteenth centuries in England. Of these, at least one million were left undated, largely due to administrative changes ...
Challenges in Annotating Medieval Latin Charters
No annotation guidelines concerning substandard Latin are presently available. This paper describes an annotation style of substandard Latin that supplements the method designed for standard Latin by the Perseus Latin Dependency Treebank and the Index Thomisticus Treebank. Each word of the corpus can be assigned only one morphological analysis. In our system, the analysis can be either function...
Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation
Recurrent neural network (RNN) has been broadly applied to natural language processing (NLP) problems. This kind of neural network is designed for modeling sequential data and has been testified to be quite efficient in sequential tagging tasks. In this paper, we propose to use bi-directional RNN with long short-term memory (LSTM) units for Chinese word segmentation, which is a crucial preprocess ...
Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM
Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval. Segmentation entails breaking words into their constituent stems, affixes and clitics. In this paper, we compare two approaches for segmenting four major Arabic dialects using only several thousand training examples for each dialect. The two approaches involve posing th...
Journal
Journal title: Journal of Data Mining and Digital Humanities
Year: 2022
ISSN: 2416-5999
DOI: https://doi.org/10.46298/jdmdh.8646